AITopics | dataset and model

dataset and model

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Bounds of Chain-of-Thought Robustness: Reasoning Steps, Embed Norms, and Beyond

Wang, Dingzirui, Zhang, Xuanliang, Xu, Keyan, Zhu, Qingfu, Che, Wanxiang, Deng, Yang

arXiv.org Artificial Intelligence, Sep-26-2025

Existing research indicates that the output of Chain-of-Thought (CoT) is significantly affected by input perturbations. Although many methods aim to mitigate such impact by optimizing prompts, a theoretical explanation of how these perturbations influence CoT outputs remains an open area of research. This gap limits our in-depth understanding of how input perturbations propagate during the reasoning process and hinders further improvements in prompt optimization methods. Therefore, in this paper, we theoretically analyze the effect of input perturbations on the fluctuation of CoT outputs. We first derive an upper bound for input perturbations under the condition that the output fluctuation is within an acceptable range, based on which we prove that: (i) This upper bound is positively correlated with the number of reasoning steps in the CoT; (ii) Even an infinitely long reasoning process cannot eliminate the impact of input perturbations. We then apply these conclusions to the Linear Self-Attention (LSA) model, which can be viewed as a simplified version of the Transformer. For the LSA model, we prove that the upper bound for input perturbation is negatively correlated with the norms of the input embedding and hidden state vectors. To validate this theoretical analysis, we conduct experiments on three mainstream datasets and four mainstream models. The experimental results align with our theoretical analysis, empirically demonstrating the correctness of our findings.
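The norm-related claim for the LSA model can be illustrated numerically. The sketch below is not the paper's derivation: it assumes a softmax-free attention layer with identity projection matrices, and every function name is hypothetical. It applies one fixed small perturbation to inputs of increasing norm and measures how much the output moves; a larger output fluctuation for the same perturbation is consistent with a smaller tolerable perturbation.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def linear_self_attention(x):
    """Softmax-free attention with identity projections: (X X^T) X / n.

    Assumption: identity query/key/value matrices, chosen only for brevity.
    """
    n = len(x)
    scores = matmul(x, transpose(x))
    out = matmul(scores, x)
    return [[v / n for v in row] for row in out]


def frobenius_distance(a, b):
    """Frobenius norm of the difference between two equally shaped matrices."""
    return math.sqrt(sum((p - q) ** 2 for ra, rb in zip(a, b) for p, q in zip(ra, rb)))


def output_fluctuation(x, delta):
    """Distance between the layer outputs for x + delta and for x."""
    perturbed = [[p + q for p, q in zip(rx, rd)] for rx, rd in zip(x, delta)]
    return frobenius_distance(linear_self_attention(perturbed), linear_self_attention(x))


rng = random.Random(0)
n_tokens, dim = 4, 3
base = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_tokens)]
delta = [[1e-3 * rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_tokens)]

# Same perturbation, inputs of growing norm.
fluctuations = {}
for scale in (1.0, 2.0, 4.0):
    scaled = [[scale * v for v in row] for row in base]
    fluctuations[scale] = output_fluctuation(scaled, delta)
    print(f"embedding scale {scale}: output fluctuation {fluctuations[scale]:.6f}")
```

Because this toy layer is cubic in its input, the fluctuation grows roughly with the square of the input scale when the perturbation is small.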

large language model, machine learning, natural language, (19 more...)

2509.21284

Country:

Asia > Middle East (0.46)
North America > Mexico (0.28)
Asia > China (0.28)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (1.00)

torchmil: A PyTorch-based library for deep Multiple Instance Learning

Castro-Macías, Francisco M., Sáez-Maldonado, Francisco J., Morales-Álvarez, Pablo, Molina, Rafael

arXiv.org Artificial Intelligence, Sep-11-2025

Multiple Instance Learning (MIL) is a powerful framework for weakly supervised learning, particularly useful when fine-grained annotations are unavailable. Despite growing interest in deep MIL methods, the field lacks standardized tools for model development, evaluation, and comparison, which hinders reproducibility and accessibility. To address this, we present torchmil, an open-source Python library built on PyTorch. torchmil offers a unified, modular, and extensible framework, featuring basic building blocks for MIL models, a standardized data format, and a curated collection of benchmark datasets and models. The library includes comprehensive documentation and tutorials to support both practitioners and researchers. torchmil aims to accelerate progress in MIL and lower the entry barrier for new users. Available at https://torchmil.readthedocs.io.
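For readers new to MIL, the core idea can be sketched without any library. The code below does not use the torchmil API; it is a plain-Python illustration under the standard MIL assumption that a bag is positive when at least one of its instances is positive. The linear scorer, its weights, and all function names are hypothetical.

```python
from typing import List, Sequence

Bag = List[Sequence[float]]  # a bag is a list of instance feature vectors


def instance_score(instance: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """Linear score for one instance (hypothetical scorer for illustration)."""
    return sum(w * x for w, x in zip(weights, instance)) + bias


def bag_score_max(bag: Bag, weights: Sequence[float], bias: float) -> float:
    """Max pooling: the bag score is the score of its highest-scoring instance."""
    return max(instance_score(inst, weights, bias) for inst in bag)


def predict_bag(bag: Bag, weights: Sequence[float], bias: float, threshold: float = 0.0) -> int:
    """Label the bag positive (1) if any instance scores above the threshold."""
    return int(bag_score_max(bag, weights, bias) > threshold)


weights, bias = [1.0, -1.0], 0.0
positive_bag = [[0.1, 0.9], [2.0, 0.5], [0.0, 0.3]]  # one instance scores 1.5
negative_bag = [[0.1, 0.9], [0.2, 0.4]]              # every instance scores below 0

print(predict_bag(positive_bag, weights, bias))
print(predict_bag(negative_bag, weights, bias))
```

Only bag-level labels are needed to train such a model, which is why MIL suits settings where instance-level annotations are unavailable.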

artificial intelligence, machine learning, torchmil, (14 more...)

2509.08129

Country: Europe > Spain > Andalusia (0.15)

Genre: Research Report (0.50)

Industry:

Health & Medicine > Therapeutic Area (0.70)
Health & Medicine > Diagnostic Medicine > Imaging (0.49)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Can Large Language Models Analyze Graphs like Professionals? A Benchmark, Datasets and Models

Neural Information Processing Systems, May-27-2025, 22:11:07 GMT

The need to analyze graphs is ubiquitous across various fields, from social networks to biological research and recommendation systems. Therefore, enabling large language models (LLMs) to process graphs is an important step toward more advanced general intelligence. However, current LLM benchmarks on graph analysis require models to reason directly over prompts describing graph topology, and are thus limited to small graphs with only a few dozen nodes. In contrast, human experts typically write programs based on popular libraries for task solving, and can thus handle graphs of different scales. To this end, a question naturally arises: can LLMs analyze graphs like professionals?
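The contrast between reasoning over a textual graph description and writing a program can be made concrete. The sketch below is not taken from the benchmark; it uses only the Python standard library rather than a graph library, and the function names are hypothetical. It answers a shortest-path question on a 10,000-node ring, a size that would be impractical to describe in a prompt.

```python
from collections import deque


def build_graph(edges):
    """Build an undirected adjacency map from an iterable of (u, v) pairs."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def shortest_path_length(graph, source, target):
    """Breadth-first search; returns the hop count, or None if unreachable."""
    if source not in graph or target not in graph:
        return None
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == target:
            return dist
        for neighbor in graph[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None


n = 10_000
ring_edges = [(i, (i + 1) % n) for i in range(n)]  # node i is linked to node i + 1
graph = build_graph(ring_edges)

print(shortest_path_length(graph, 0, 5_000))
```

The program's cost grows linearly with the number of nodes and edges, so the same code handles graphs far larger than this one.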

benchmark, dataset and model, language model analyze graph, (5 more...)

Genre: Research Report > New Finding (0.38)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

LLM-PQA: LLM-enhanced Prediction Query Answering

Li, Ziyu, Zhao, Wenjie, Katsifodimos, Asterios, Hai, Rihan

arXiv.org Artificial Intelligence, Sep-2-2024

The advent of Large Language Models (LLMs) provides an opportunity to change the way queries are processed, moving beyond the constraints of conventional SQL-based database systems. However, using an LLM to answer a prediction query is still challenging, since an external ML model has to be employed and inference has to be performed in order to provide an answer. This paper introduces LLM-PQA, a novel tool that addresses prediction queries formulated in natural language. LLM-PQA is the first to combine the capabilities of LLMs and a retrieval-augmented mechanism for the needs of prediction queries by integrating data lakes and model zoos. This integration provides users with access to a vast spectrum of heterogeneous data and diverse ML models, facilitating dynamic prediction query answering. In addition, LLM-PQA can dynamically train models on demand, based on specific query requirements, ensuring reliable and relevant results even when no pre-trained model in the model zoo is available for the task.
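The "retrieve a model, otherwise train one on demand" flow described above can be sketched in a few lines. This is not LLM-PQA's implementation: the natural-language parsing step is omitted, the task key is assumed to be already extracted from the query, the model zoo and data lake are plain dictionaries, and the on-demand "model" is a trivial mean predictor. All names are hypothetical.

```python
from statistics import mean
from typing import Callable, Dict, List, Tuple

Model = Callable[[List[float]], float]


def train_mean_model(targets: List[float]) -> Model:
    """Stand-in for on-demand training: always predicts the mean target."""
    average = mean(targets)
    return lambda features: average


def answer_prediction_query(
    task: str,
    features: List[float],
    model_zoo: Dict[str, Model],
    data_lake: Dict[str, List[float]],
) -> Tuple[float, bool]:
    """Return (prediction, trained_on_demand) for an already-parsed query."""
    trained_on_demand = False
    model = model_zoo.get(task)
    if model is None:
        # No pre-trained model for this task: train one from the data lake
        # and register it so later queries can reuse it.
        model = train_mean_model(data_lake[task])
        model_zoo[task] = model
        trained_on_demand = True
    return model(features), trained_on_demand


model_zoo: Dict[str, Model] = {"house_price": lambda f: 100.0 * f[0]}
data_lake = {"delivery_time": [30.0, 40.0, 50.0]}

print(answer_prediction_query("house_price", [3.0], model_zoo, data_lake))
print(answer_prediction_query("delivery_time", [1.0], model_zoo, data_lake))
```

Registering the freshly trained model in the zoo means a repeated query for the same task is served without retraining.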

dataset, llm-pqa, query, (15 more...)

doi: 10.1145/3627673.3679210

2409.0114

Country:

Europe > Netherlands > South Holland > Delft (0.06)
North America > United States > Idaho > Ada County > Boise (0.06)
North America > United States > New York > New York County > New York City (0.04)

Genre: Research Report (0.51)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Racial/Ethnic Categories in AI and Algorithmic Fairness: Why They Matter and What They Represent

Mickel, Jennifer

arXiv.org Artificial Intelligence, Apr-10-2024

Racial diversity has become increasingly discussed within the AI and algorithmic fairness literature, yet little attention is focused on justifying the choices of racial categories and understanding how people are racialized into these chosen racial categories. Even less attention is given to how racial categories shift and how the racialization process changes depending on the context of a dataset or model. An unclear understanding of who comprises the racial categories chosen and how people are racialized into these categories can lead to varying interpretations of these categories. These varying interpretations can lead to harm when the understanding of racial categories and the racialization process is misaligned from the actual racialization process and racial categories used. Harm can also arise if the racialization process and racial categories used are irrelevant or do not exist in the context they are applied.

The utilization of racial and ethnic categories in the development of datasets and models facilitates the inclusion and documentation of diverse perspectives. Racial and ethnic categories are especially crucial for datasets and models in which race and ethnicity serve as relevant factors, may act as confounding variables, or enable the ability to audit for fairness using race and ethnicity for fairness purposes. For example, understanding the racial and/or ethnic target of hate speech is crucial for understanding the impact of hate speech, as hate speech can differ based on the race and/or ethnicity of the target [48]. Similarly, in health, race is correlated with health outcomes [6], and knowledge of a patient's race and ethnicity can help contextualize the patient's experience and health history [53]. In algorithmic fairness settings, knowledge of

category, cultural context, racial category, (16 more...)

2404.06717

Country:

South America > Brazil > Rio de Janeiro > Rio de Janeiro (0.05)
Oceania > Australia (0.05)
North America > United States > Florida > Broward County (0.04)
(9 more...)

Genre: Research Report (0.82)

Industry:

Law (0.93)
Health & Medicine > Consumer Health (0.86)
Government > Regional Government > North America Government > United States Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.46)

Question Answering for Electronic Health Records: A Scoping Review of datasets and models

Bardhan, Jayetri, Roberts, Kirk, Wang, Daisy Zhe

arXiv.org Artificial Intelligence, Nov-7-2023

Question Answering (QA) systems on patient-related data can assist both clinicians and patients. They can, for example, assist clinicians in decision-making and enable patients to have a better understanding of their medical history. Significant amounts of patient data are stored in Electronic Health Records (EHRs), making EHR QA an important research area. In EHR QA, the answer is obtained from the medical record of the patient. Because of the differences in data format and modality, this differs greatly from other medical QA tasks that employ medical websites or scientific papers to retrieve answers, making it critical to research EHR question answering. This study aimed to provide a methodological review of existing works on QA over EHRs. We searched for articles from January 1st, 2005 to September 30th, 2023 in four digital sources including Google Scholar, ACL Anthology, ACM Digital Library, and PubMed to collect relevant publications on EHR QA. 4111 papers were identified for our study, and after screening based on our inclusion criteria, we obtained a total of 47 papers for further study. Out of the 47 papers, 25 papers were about EHR QA datasets, and 37 papers were about EHR QA models. It was observed that QA on EHRs is relatively new and unexplored. Most of the works are fairly recent. Also, it was observed that emrQA is by far the most popular EHR QA dataset, both in terms of citations and usage in other papers. Furthermore, we identified the different models used in EHR QA along with the evaluation metrics used for these models.

dataset and model, electronic health record, scoping review

2310.08759

Genre: Research Report (0.40)

Industry: Health & Medicine > Health Care Technology > Medical Record (1.00)

Technology: Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.80)

NLPositionality: Characterizing Design Biases of Datasets and Models

Santy, Sebastin, Liang, Jenny T., Bras, Ronan Le, Reinecke, Katharina, Sap, Maarten

arXiv.org Artificial Intelligence, Jun-2-2023

Design biases in NLP systems, such as performance differences for different populations, often stem from their creator's positionality, i.e., views and lived experiences shaped by identity and background. Despite the prevalence and risks of design biases, they are hard to quantify because researcher, system, and dataset positionality is often unobserved. We introduce NLPositionality, a framework for characterizing design biases and quantifying the positionality of NLP datasets and models. Our framework continuously collects annotations from a diverse pool of volunteer participants on LabintheWild, and statistically quantifies alignment with dataset labels and model predictions. We apply NLPositionality to existing datasets and models for two tasks: social acceptability and hate speech detection. To date, we have collected 16,299 annotations in over a year for 600 instances from 1,096 annotators across 87 countries. We find that datasets and models align predominantly with Western, White, college-educated, and younger populations. Additionally, certain groups, such as non-binary people and non-native English speakers, are further marginalized by datasets and models as they rank least in alignment across all tasks. Finally, we draw from prior literature to discuss how researchers can examine their own positionality and that of their datasets and models, opening the door for more inclusive NLP systems.
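One way to "statistically quantify alignment" between a demographic group's annotations and dataset labels is a correlation coefficient. The abstract does not name the statistic, so the sketch below assumes Pearson's r; the group names and all ratings are made-up illustrative values, not data from the study.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equally long numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (spread_x * spread_y)


# Illustrative values only: one dataset label per instance, and each
# group's mean rating for the same instances (-1 = unacceptable, 1 = acceptable).
dataset_labels = [1, -1, 0, 1, -1]
group_mean_ratings = {
    "group_a": [0.8, -0.9, 0.1, 0.7, -0.6],
    "group_b": [-0.2, 0.3, 0.5, 0.1, -0.4],
}

alignment = {
    group: pearson_r(ratings, dataset_labels)
    for group, ratings in group_mean_ratings.items()
}
for group, r in sorted(alignment.items(), key=lambda item: -item[1]):
    print(f"{group}: r = {r:.3f}")
```

Ranking groups by this score shows which populations a dataset's labels track most closely and which they track least.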

large language model, machine learning, natural language, (21 more...)

2306.01943

Country:

Asia > India (0.04)
Europe > Germany (0.04)
Asia > Middle East > Jordan (0.04)
(109 more...)

Genre: Research Report > Experimental Study (0.68)

Industry:

Health & Medicine (0.93)
Education > Educational Setting > Higher Education (0.68)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.94)
Information Technology > Human Computer Interaction (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Natural Language Processing in Ethiopian Languages: Current State, Challenges, and Opportunities

Tonja, Atnafu Lambebo, Belay, Tadesse Destaw, Azime, Israel Abebe, Ayele, Abinew Ali, Mehamed, Moges Ahmed, Kolesnikova, Olga, Yimam, Seid Muhie

arXiv.org Artificial Intelligence, Mar-25-2023

This survey delves into the current state of natural language processing (NLP) for four Ethiopian languages: Amharic, Afaan Oromo, Tigrinya, and Wolaytta. Through this paper, we identify key challenges and opportunities for NLP research in Ethiopia. Furthermore, we provide a centralized repository on GitHub that contains publicly available resources for various NLP tasks in these languages. This repository can be updated periodically with contributions from other researchers. Our objective is to identify research gaps and disseminate the information to NLP researchers interested in Ethiopian languages and encourage future research in this domain.

ethiopian language, machine learning, natural language, (20 more...)

2303.14406

Country:

Asia > Middle East > Israel (0.04)
Africa > Ethiopia > Addis Ababa > Addis Ababa (0.04)
Africa > Ethiopia > Southern Nations, Nationalities, and Peoples' Region > Hawassa (0.04)
(14 more...)

Genre:

Research Report (1.00)
Overview (1.00)

Industry: Media > News (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Aries: Efficient Testing of Deep Neural Networks via Labeling-Free Accuracy Estimation

Hu, Qiang, Guo, Yuejun, Xie, Xiaofei, Cordy, Maxime, Ma, Lei, Papadakis, Mike, Traon, Yves Le

arXiv.org Artificial Intelligence, Feb-3-2023

Deep learning (DL) plays an increasingly important role in our daily life due to its competitive performance in industrial application domains. As the core of DL-enabled systems, deep neural networks (DNNs) need to be carefully evaluated to ensure the produced models match the expected requirements. In practice, the de facto standard to assess the quality of DNNs in the industry is to check their performance (accuracy) on a collected set of labeled test data. However, preparing such labeled data is often not easy, partly because of the huge labeling effort: data labeling is labor-intensive, especially with the massive amount of new unlabeled data arriving every day. Recent studies show that test selection for DNNs is a promising direction that tackles this issue by selecting minimal representative data to label and using these data to assess the model. However, it still requires human effort and cannot be fully automated. In this paper, we propose a novel technique, named Aries, that can estimate the performance of DNNs on new unlabeled data using only the information obtained from the original test data. The key insight behind our technique is that the model should have similar prediction accuracy on data that lie at similar distances from the decision boundary. We performed a large-scale evaluation of our technique on two famous datasets, CIFAR-10 and Tiny-ImageNet, four widely studied DNN models including ResNet101 and DenseNet121, and 13 types of data transformation methods. Results show that the accuracy estimated by Aries is only 0.03% to 2.60% off the true accuracy. Besides, Aries also outperforms the state-of-the-art labeling-free methods in 50 out of 52 cases and selection-labeling-based methods in 96 out of 128 cases.
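The key insight can be sketched as a bucketed estimate. The code below is a simplification and not the Aries algorithm itself: it assumes prediction confidence in [0, 1] as a stand-in for distance to the decision boundary, uses equal-width buckets, and all function names are hypothetical. Per-bucket accuracy is measured on the labeled test data and then transferred to unlabeled data that falls into the same buckets.

```python
def bucket_index(confidence, n_buckets):
    """Map a confidence in [0, 1] to one of n equal-width buckets."""
    return min(int(confidence * n_buckets), n_buckets - 1)


def bucket_accuracies(confidences, correct, n_buckets):
    """Accuracy per bucket on labeled data; None for buckets with no samples."""
    hits = [0] * n_buckets
    counts = [0] * n_buckets
    for confidence, is_correct in zip(confidences, correct):
        b = bucket_index(confidence, n_buckets)
        counts[b] += 1
        hits[b] += int(is_correct)
    return [h / c if c else None for h, c in zip(hits, counts)]


def estimate_accuracy(unlabeled_confidences, per_bucket_accuracy, n_buckets):
    """Average the labeled-data bucket accuracy over the unlabeled samples.

    Samples landing in a bucket with no labeled data are skipped.
    """
    usable = [
        per_bucket_accuracy[bucket_index(c, n_buckets)]
        for c in unlabeled_confidences
    ]
    usable = [a for a in usable if a is not None]
    if not usable:
        raise ValueError("no unlabeled sample falls in a bucket with labeled data")
    return sum(usable) / len(usable)


n_buckets = 2
labeled_confidences = [0.95, 0.9, 0.85, 0.8, 0.4, 0.3, 0.45, 0.2]
labeled_correct = [True, True, True, True, True, False, False, False]
per_bucket = bucket_accuracies(labeled_confidences, labeled_correct, n_buckets)

unlabeled_confidences = [0.9, 0.7, 0.3, 0.1]
estimated = estimate_accuracy(unlabeled_confidences, per_bucket, n_buckets)
print(per_bucket, estimated)
```

No label from the new data is used at any point; only its confidence distribution changes the estimate.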

accuracy, artificial intelligence, machine learning, (18 more...)

2207.10942

Country:

North America > Canada > Ontario > Toronto (0.14)
North America > Canada > Alberta (0.14)
Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
(2 more...)

Genre: Research Report > New Finding (1.00)

Industry: Information Technology (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Picard understanding Darmok: A Dataset and Model for Metaphor-Rich Translation in a Constructed Language

Jansen, Peter, Boyd-Graber, Jordan

arXiv.org Artificial Intelligence, Oct-14-2022

Tamarian, a fictional language introduced in the Star Trek episode Darmok, communicates meaning through utterances of metaphorical references, such as "Darmok and Jalad at Tanagra" instead of "We should work together." This work assembles a Tamarian-English dictionary of utterances from the original episode and several follow-on novels, and uses this to construct a parallel corpus of 456 English-Tamarian utterances. A machine translation system based on a large language model (T5) is trained using this parallel corpus, and is shown to produce an accuracy of 76% when translating from English to Tamarian on known utterances.

artificial intelligence, natural language, utterance, (17 more...)

2107.08146

Country:

Asia > Middle East > Jordan (0.05)
North America > United States > Arizona (0.05)
North America > United States > New York > Kings County > New York City (0.04)
(3 more...)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
